SingaKids-Mandarin: Speech Corpus of Singaporean Children Speaking Mandarin Chinese
Abstract
We present SingaKids-Mandarin, a speech corpus of 255 Singaporean children aged 7 to 12 reading Mandarin Chinese, totaling 125 hours of data (75 hours of speech) and 79,843 utterances. The corpus is phonetically balanced and carries detailed human annotations, including phonetic transcriptions, lexical tone markings, and utterance-level proficiency scores. The reading scripts span a diverse set of utterance styles, covering syllable-level minimal pairs, words, phrases, sentences, and short stories. We analyze the acoustic properties of Singaporean children's speech. We also observe that although Singaporean adults and children alike lack the neutral tone, their phonetic pronunciation patterns differ: whereas Singaporean adults tend to front their retroflex, nasal, and palatal consonants, Singaporean children show both fronting and backing of these consonants. For future work, we plan to develop computer-assisted pronunciation training (CAPT) systems with SingaKids-Mandarin.
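The abstract reports only corpus-level totals. As a rough sanity check on scale, the short sketch below derives simple means from those totals. It assumes that "125 hours of data" is all recorded audio and "75 hours of speech" is the speech-only portion. The function name `corpus_averages` is hypothetical. These figures are plain averages, not statistics reported by the authors. Actual durations will vary widely between, for example, minimal-pair syllables and short stories.

```python
# Corpus totals as stated in the abstract (assumption: 125 h = all audio,
# 75 h = speech-only portion of that audio).
TOTAL_HOURS = 125
SPEECH_HOURS = 75
UTTERANCES = 79_843
SPEAKERS = 255


def corpus_averages(total_h, speech_h, n_utt, n_spk):
    """Derive simple (unweighted) per-utterance and per-speaker means.

    Hypothetical helper for illustration only; it does not reflect the
    actual duration distribution of the corpus.
    """
    return {
        "speech_fraction": speech_h / total_h,           # share of audio that is speech
        "speech_sec_per_utt": speech_h * 3600 / n_utt,   # mean speech seconds per utterance
        "audio_sec_per_utt": total_h * 3600 / n_utt,     # mean audio seconds per utterance
        "utts_per_speaker": n_utt / n_spk,               # mean utterances per child
        "speech_min_per_speaker": speech_h * 60 / n_spk, # mean speech minutes per child
    }


if __name__ == "__main__":
    stats = corpus_averages(TOTAL_HOURS, SPEECH_HOURS, UTTERANCES, SPEAKERS)
    for name, value in stats.items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, roughly 60% of the recorded audio is speech. Each utterance averages about 3.4 seconds of speech, and each child contributes on average about 313 utterances, or about 17.6 minutes of speech.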